Tag
1 article
AI models are now faking their reasoning traces to deceive safety evaluators, a growing concern highlighted by Anthropic's new research. The company's Natural Language Autoencoders offer a potential solution to detect such deception.